Dependency-structure Annotation to Corpus of Spontaneous Japanese
Abstract
In Japanese, the syntactic structure of a sentence is generally represented by the relationships between phrasal units, called bunsetsus, based on a dependency grammar. In the same way, the syntactic structure of a sentence in the Corpus of Spontaneous Japanese (CSJ), a large corpus of spontaneous Japanese speech, is represented by dependency relationships between bunsetsus. This paper describes the criteria and definitions of dependency relationships between bunsetsus in the CSJ. The dependency structure of the CSJ is investigated, and the difference between the dependency structures of written text and spontaneous speech is discussed in terms of the dependency accuracies obtained by using a corpus-based model. It is shown that the accuracy of automatic dependency-structure analysis can be improved if characteristic phenomena of spontaneous speech, such as self-corrections, basic utterance units in spontaneous speech, and bunsetsus that have no modifiee, are detected and used for dependency-structure analysis.
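As a rough illustration of the representation described in the abstract, the sketch below stores a sentence as a list of bunsetsus, each pointing to the index of its modifiee (or `None` when it has no modifiee), and computes a simple dependency accuracy. The function name, the romanized example sentence, and the choice to score every bunsetsu (including those with no modifiee) are assumptions made for illustration; they are not the CSJ annotation format or the evaluation procedure used in the paper.

```python
from typing import List, Optional

def dependency_accuracy(gold: List[Optional[int]],
                        pred: List[Optional[int]]) -> float:
    """Fraction of bunsetsus whose predicted modifiee index equals the gold one.

    Each list has one entry per bunsetsu; an entry is the index of the
    bunsetsu it depends on, or None if it has no modifiee.
    Assumption: every bunsetsu is scored. Other conventions exclude some.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)

# Hypothetical example: "kare-wa hon-o yonda" ("he read the book").
bunsetsus = ["kare-wa", "hon-o", "yonda"]
gold_heads = [2, 2, None]   # both arguments depend on the final predicate
pred_heads = [1, 2, None]   # a parser output with one wrong attachment

accuracy = dependency_accuracy(gold_heads, pred_heads)
print(f"{accuracy:.3f}")
```

Marking "no modifiee" explicitly with `None`, rather than forcing every bunsetsu to attach somewhere, is one way a representation could accommodate the spontaneous-speech phenomena the abstract mentions.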
Similar resources
Word-level Dependency-structure Annotation to Corpus of Spontaneous Japanese and its Application
In Japanese, the syntactic structure of a sentence is generally represented by the relationship between phrasal units, bunsetsus in Japanese, based on a dependency grammar. In many cases, the syntactic structure of a bunsetsu is not considered in syntactic structure annotation. This paper gives the criteria and definitions of dependency relationships between words in a bunsetsu and their applic...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Dependency structure analysis and sentence boundary detection in spontaneous Japanese
This paper addresses automatic detection of dependencies between Japanese phrasal units called bunsetsus, and sentence boundaries in a spontaneous speech corpus. In spontaneous speech, the biggest problem with dependency structure analysis is that sentence boundaries are ambiguous. In this paper, we propose two methods for improving the accuracy of sentence boundary detection in spontaneous Jap...
A Japanese Word Dependency Corpus
In this paper, we present a corpus annotated with dependency relationships in Japanese. It contains about 30 thousand sentences in various domains. Six domains in Balanced Corpus of Contemporary Written Japanese have part-of-speech and pronunciation annotation as well. Dictionary example sentences have pronunciation annotation and cover basic vocabulary in Japanese with English sentence equival...
Stochastic Dependency Parsing of Spontaneous Japanese Spoken Language
This paper describes the characteristic features of dependency structures of Japanese spoken language by investigating a spoken dialogue corpus, and proposes a stochastic approach to dependency parsing. The method can robustly cope with inversion phenomena and bunsetsus which don’t have the head bunsetsu by relaxing the syntactic dependency constraints. The method acquires in advance the probab...